226 research outputs found

    A systematic review of data quality issues in knowledge discovery tasks

    Data volumes are growing rapidly because organizations continuously capture data to support better decision-making. The fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.

    How to Address the Data Quality Issues in Regression Models: A Guided Process for Data Cleaning

    Today, data availability has gone from scarce to superabundant. Technologies like IoT, trends in social media and the capabilities of smartphones are producing and digitizing large amounts of data that were previously unavailable. This massive increase in data creates opportunities for new business models, but also demands new techniques and methods for data quality in knowledge discovery, especially when the data come from different sources (e.g., sensors, social networks, cameras, etc.). Conclusions drawn from a dataset depend on the quality of the information it contains, and that quality is increasingly assured with the aid of data cleaning approaches. Guaranteeing high data quality is therefore considered the primary goal of the data scientist. In this paper, we propose a process for data cleaning in regression models (DC-RM). The proposed process is evaluated on real datasets from the UCI Repository of Machine Learning Databases. To assess the data cleaning process, the datasets cleaned by DC-RM were used to train the same regression models proposed by the authors of the UCI datasets. The results achieved by the models trained on the datasets produced by DC-RM are better than or equal to those reported by the datasets' authors. This work has also been supported by the Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R).

    A new predictive neural architecture for solving temperature inverse problems in microwave-assisted drying processes

    In this paper, a novel learning architecture based on neural networks is used for temperature inverse modeling in microwave-assisted drying processes. The proposed design combines the accuracy of radial basis functions (RBF) with the algebraic capabilities of matrix polynomial structures by using a two-level structure. The architecture is trained on temperature curves, Tc(t), previously generated by a validated drying model. Interconnecting the learning-based networks makes it possible to find the optimal electric field (E) values that yield the Tc(t) curve best fitting a desired temperature target in a specific time slot.

    New details about the frequency behavior of irradiated bipolar operational amplifiers

    The frequency behavior of a bipolar operational amplifier (op amp) is always expected to worsen when the device is irradiated. In other words, parameters such as the slew rate and the gain-bandwidth product are expected to decrease after either neutron or gamma tests. However, neutron and TID tests performed on a large variety of bipolar op amps have shown that the evolution of the frequency behavior is not as simple as is usually believed. In fact, there is evidence that the power supply values have an increasing influence on these parameters, which can be extremely important in some devices. The relationship among different frequency parameters has also been investigated and, finally, an interesting and scarcely reported phenomenon is described: the appearance of spontaneous oscillations in op amps with feedback, undoubtedly related to the modification of the gain and phase margins of the device.

    Sample selection method for arbitrary fading emulation using mode-stirred chambers

    Mode-stirred chambers (MSC) consist of one or more coupled resonant cavities that allow the measurement of antenna parameters such as antenna efficiency, correlation, diversity gain or MIMO capacity, among others. In a single-cavity mode-stirred chamber, also known as a reverberation chamber (RC), the environment is isotropic and the amplitude of the signal is Rayleigh distributed. Real environments, however, rarely follow an isotropic Rayleigh-fading scenario. Previous results have shown that Rician-fading emulation can be obtained via hardware modification using an RC. These methods lack accurate emulation performance and are strongly dependent upon chamber size and antenna configuration. Given the innate complexity of an MSC with more than one cavity, the coupling structure generates sample sets that are complex enough to contain different clusters with diverse fading characteristics. This paper presents a novel method to accurately emulate a more realistic Rician-fading distribution from a Rayleigh-fading distribution by selecting parts of the sample set that form different statistical ensembles, using a complex two-cavity multi-iris-coupled MSC. Sample selection is performed using a genetic algorithm. Results demonstrate the potential of MSCs for versatile MIMO fading emulation and OTA testing. The method is patent protected by EMITE Ing. This work was supported in part by the Spanish National R&D Programme through TEC2008-05811 and by Fundación Séneca, the R&D coordinating agency for the Region of Murcia (Spain), under the 11783/PI/09 project.
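
    The Rayleigh/Rician distinction in the abstract above can be illustrated with a short sketch. It does not reproduce the paper's genetic-algorithm sample selection. It only shows, on synthetic data, how a set of complex channel samples can be scored by a moment-based Rician K-factor estimate (power of the mean divided by the variance around the mean). The helper names `k_factor` and `simulate`, the sample count, and the line-of-sight amplitude are assumptions made for illustration.

```python
import random


def k_factor(samples):
    """Moment-based K-factor estimate: |mean|^2 / variance of complex samples."""
    n = len(samples)
    mean = sum(samples) / n
    scattered_power = sum(abs(s - mean) ** 2 for s in samples) / n
    return abs(mean) ** 2 / scattered_power


def simulate(n, los_amplitude, rng):
    """Hypothetical channel: constant line-of-sight term plus complex Gaussian scatter."""
    sigma = 0.5 ** 0.5  # per-component std, so the scattered power is about 1
    return [
        complex(los_amplitude + rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        for _ in range(n)
    ]


rng = random.Random(0)
rayleigh = simulate(5000, 0.0, rng)  # no dominant component: K close to 0
rician = simulate(5000, 2.0, rng)    # dominant component: K close to 2**2 / 1

print("Rayleigh-like K:", k_factor(rayleigh))
print("Rician-like K:", k_factor(rician))
```

    A selection procedure such as the one the abstract describes could, in principle, use a score like this as part of its fitness function, but that link is an assumption here and not something the abstract states.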

    Emulation of MIMO nonisotropic fading environments with reverberation chambers

    Some recent publications have extended the emulating capabilities of reverberation chambers. While polarization imbalance has been removed and Ricean-fading environments are now properly emulated, these chambers are still limited to isotropic non-line-of-sight (NLOS) scattering. By controlling the received power, the number of resolvable multipath components (MPC), the angular spread (AS), and the angle of arrival (AoA), the emulation of real propagation environments with both isotropic and nonisotropic scattering is demonstrated in this letter using a reverberation chamber with several multiple-input–multiple-output (MIMO) arrays. This work was supported in part by the Fundación Séneca, the R&D unit of the Autonomous Region of Murcia (Spain), under project reference TIC-TEC 07/02-0005 and by the Spanish National R&D Programme through TEC2007/63470/TCM.

    Effect of user presence on receive diversity and MIMO capacity for Rayleigh-fading channels

    The effects of the presence of the user on multiple-input–multiple-output (MIMO) performance for wireless communications systems are investigated through measurements in a reverberation chamber. Measured results demonstrate that, despite a decrease in the envelope correlation coefficient, a degradation of both diversity gain and MIMO capacity is expected when the user is present. While the validity of the correlation coefficients for predicting MIMO performance is limited in the presence of the user, the effects have also been found to be strongly dependent upon frequency, antenna topology, and user characteristics. This work was supported in part by the Fundación Séneca, the R&D coordinating unit of the Autonomous Region of Murcia (Spain), under Projects 2I05SU0033 and TIC-TEC 06/01-0003.

    A case-based reasoning system for recommendation of data cleaning algorithms in classification and regression tasks

    Recent advances in information technologies (social networks, mobile applications, Internet of Things, etc.) generate a deluge of digital data, but converting these data into useful information for business decisions is a growing challenge. Exploiting this massive amount of data through the knowledge discovery (KD) process involves identifying valid, novel, potentially useful and understandable patterns in a huge volume of data. However, preparing the data is a non-trivial refinement task that requires technical expertise in data cleaning methods and algorithms. Consequently, choosing a suitable data analysis technique is difficult for non-expert users. To address these problems, we propose a case-based reasoning (CBR) system that recommends data cleaning algorithms for classification and regression tasks. In our approach, we represent the problem space by the meta-features of the dataset, its attributes, and the target variable. The solution space contains the data cleaning algorithms used for each dataset. We represent the cases through a Data Cleaning Ontology. The case retrieval mechanism is composed of a filter phase and a similarity phase. In the first phase, we define two filter approaches based on clustering and quartile analysis. These filters retrieve a reduced number of relevant cases. The second phase scores the similarity between a new case and the cases retrieved by the filters, and ranks them. The proposed retrieval mechanism was evaluated by a panel of judges, who scored the similarity between a query case and all cases in the case base (ground truth). Measured against the judges' ranking, the retrieval mechanism reaches an average precision of 94.5% in the top 3, 84.55% in the top 7, and 78.35% in the top 10. The authors are grateful to the research groups Control Learning Systems Optimization Group (CAOS) of the Carlos III University of Madrid and Telematics Engineering Group (GIT) of the University of Cauca for their technical support. In addition, the authors are grateful to COLCIENCIAS for the PhD scholarship granted to PhD. David Camilo Corrales. This work has also been supported by Project Alternativas Innovadoras de Agricultura Inteligente para sistemas productivos agrícolas del departamento del Cauca soportado en entornos de IoT, financed by Convocatoria 04C-2018 Banco de Proyectos Conjuntos UEES-Sostenibilidad of Project Red de formación de talento humano para la innovación social y productiva en el Departamento del Cauca InnovAcción Cauca, ID-3848, and by the Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R).
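
    The two-phase retrieval described in this abstract (a filter phase followed by a similarity ranking) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the case layout, the meta-feature names (`n_rows`, `missing_ratio`), the single-feature quartile filter, and the distance-based similarity score are all hypothetical, and the ontology and clustering filter mentioned in the abstract are not modeled.

```python
import math
from statistics import quantiles


def quartile_index(value, cut_points):
    """Return 0..3, the quartile bucket that value falls into."""
    return sum(value > cut for cut in cut_points)


def retrieve(case_base, query, filter_feature, top_k=3):
    """Filter cases by quartile of one meta-feature, then rank by similarity."""
    values = [case["features"][filter_feature] for case in case_base]
    cuts = quantiles(values, n=4)  # three cut points splitting the case base
    target = quartile_index(query[filter_feature], cuts)

    # Phase 1 (filter): keep only cases in the same quartile as the query.
    candidates = [
        case for case in case_base
        if quartile_index(case["features"][filter_feature], cuts) == target
    ]

    # Phase 2 (similarity): assumed score, inverse of Euclidean distance.
    keys = sorted(query)

    def similarity(case):
        distance = math.dist(
            [case["features"][k] for k in keys],
            [query[k] for k in keys],
        )
        return 1.0 / (1.0 + distance)

    ranked = sorted(candidates, key=similarity, reverse=True)
    return [(case["algorithm"], similarity(case)) for case in ranked[:top_k]]


# Hypothetical case base: each case pairs dataset meta-features with the
# data cleaning algorithm that was used on that dataset.
example_cases = [
    {"features": {"n_rows": 100.0, "missing_ratio": 0.05}, "algorithm": "imputation"},
    {"features": {"n_rows": 120.0, "missing_ratio": 0.30}, "algorithm": "outlier_removal"},
    {"features": {"n_rows": 900.0, "missing_ratio": 0.10}, "algorithm": "deduplication"},
    {"features": {"n_rows": 950.0, "missing_ratio": 0.02}, "algorithm": "normalization"},
]
print(retrieve(example_cases, {"n_rows": 110.0, "missing_ratio": 0.10}, "n_rows"))
```

    In practice the meta-features would need scaling before a Euclidean distance is meaningful; the sketch skips that step to stay short.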